[REVIEW] Support for sample_weight parameter in LogisticRegression #3572
C++ looks very clean, most comments were on the Python side (also looks clean, mostly minor stuff)
This PR requires a RAFT PR to be approved and merged for CI to pass: raft:#168.
Just 2 very small things, otherwise it looks great
@@ -126,6 +126,15 @@ class LogisticRegression(Base,
        If False, the model expects that you have centered the data.
    class_weight: None
        Custom class weighs are currently not supported.
    class_weight: dict or 'balanced', default=None
        By default all classes have a weight one. However, a dictionnary
By default all classes have a weight one. However, a dictionnary
By default all classes have a weight one. However, a dictionary
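For context on the 'balanced' option mentioned in the docstring: a minimal sketch of how balanced class weights are usually derived, assuming the same heuristic scikit-learn documents (n_samples / (n_classes * count of each class)). The helper name balanced_class_weights is hypothetical and this is not the code from this PR.

```python
from collections import Counter


def balanced_class_weights(y):
    """Hypothetical helper: weight each class inversely to its frequency.

    Assumes the scikit-learn heuristic n_samples / (n_classes * count).
    """
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Three samples of class 0 and one of class 1: the rarer class
# receives the larger weight.
weights = balanced_class_weights([0, 0, 0, 1])
print(weights)
```

With uniform class frequencies every weight comes out as 1.0, which matches the "all classes have a weight one" default described above.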
if class_weight == 'balanced':
    self.class_weight_ = 'balanced'
else:
    classes = list(class_weight.keys())
Indeed, but my comment is more that the classes the dictionary of weights has (as keys) have to coincide with the classes from the operation self.classes_ = cp.unique(y_m), no? I.e., if the user passes a class in fit that was not in the dict of class_weights (when it was passed as a dict), it would be an error, which would be consistent with what Scikit does, if I'm not mistaken: https://github.com/scikit-learn/scikit-learn/blob/95119c13af77c76e150b753485c662b7c52a41a2/sklearn/utils/class_weight.py#L67
@dantegd In my understanding there's two cases:
Is this right, or am I mistaken somewhere? If I'm right, thanks for noticing this and providing the link, I'll fix this.
@viclafargue yes indeed, that's the case, and thanks for writing it more clearly than I did :). For reference, we can see how that works in scikit:
>>> from sklearn.linear_model import LogisticRegression
>>> import numpy as np
>>> weights = {0: 0.2, 1:0.4}
>>> model1 = LogisticRegression(class_weight=weights)
>>> a = np.random.rand(8).reshape(4,2)
>>> a
array([[0.05872595, 0.7262419 ],
       [0.78006228, 0.47405287],
       [0.52005636, 0.63693384],
       [0.66651034, 0.18605195]])
>>> b = np.array([0, 0, 0, 1])
>>> model1.fit(a, b)
LogisticRegression(class_weight={0: 0.2, 1: 0.4})
>>> model1.__dict__
{'penalty': 'l2', 'dual': False, 'tol': 0.0001, 'C': 1.0, 'fit_intercept': True, 'intercept_scaling': 1, 'class_weight': {0: 0.2, 1: 0.4}, 'random_state': None, 'solver': 'lbfgs', 'max_iter': 100, 'multi_class': 'auto', 'verbose': 0, 'warm_start': False, 'n_jobs': None, 'l1_ratio': None, 'n_features_in_': 2, 'classes_': array([0, 1]), 'coef_': array([[ 0.04954003, -0.10064803]]), 'intercept_': array([-0.38777167]), 'n_iter_': array([6], dtype=int32)}
>>> c = np.array([0, 2, 0, 1])
>>> model1.fit(a, c)
LogisticRegression(class_weight={0: 0.2, 1: 0.4})
>>> model1.__dict__
{'penalty': 'l2', 'dual': False, 'tol': 0.0001, 'C': 1.0, 'fit_intercept': True, 'intercept_scaling': 1, 'class_weight': {0: 0.2, 1: 0.4}, 'random_state': None, 'solver': 'lbfgs', 'max_iter': 100, 'multi_class': 'auto', 'verbose': 0, 'warm_start': False, 'n_jobs': None, 'l1_ratio': None, 'n_features_in_': 2, 'classes_': array([0, 1, 2]), 'coef_': array([[-0.13812798, 0.08768523],
[ 0.00804634, -0.10698594],
[ 0.13008164, 0.0193007 ]]), 'intercept_': array([-0.25710905, -0.26129142, 0.51840047]), 'n_iter_': array([6], dtype=int32)}
>>> d = np.array([0, 2, 0, 2])
>>> model1.fit(a, d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/galahad/miniconda3/envs/ns0317/lib/python3.8/site-packages/sklearn/linear_model/_logistic.py", line 1407, in fit
    fold_coefs_ = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "/home/galahad/miniconda3/envs/ns0317/lib/python3.8/site-packages/joblib/parallel.py", line 1041, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/galahad/miniconda3/envs/ns0317/lib/python3.8/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/galahad/miniconda3/envs/ns0317/lib/python3.8/site-packages/joblib/parallel.py", line 777, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/galahad/miniconda3/envs/ns0317/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/home/galahad/miniconda3/envs/ns0317/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/home/galahad/miniconda3/envs/ns0317/lib/python3.8/site-packages/joblib/parallel.py", line 262, in __call__
    return [func(*args, **kwargs)
  File "/home/galahad/miniconda3/envs/ns0317/lib/python3.8/site-packages/joblib/parallel.py", line 262, in <listcomp>
    return [func(*args, **kwargs)
  File "/home/galahad/miniconda3/envs/ns0317/lib/python3.8/site-packages/sklearn/linear_model/_logistic.py", line 665, in _logistic_regression_path
    class_weight_ = compute_class_weight(class_weight,
  File "/home/galahad/miniconda3/envs/ns0317/lib/python3.8/site-packages/sklearn/utils/validation.py", line 73, in inner_f
    return f(**kwargs)
  File "/home/galahad/miniconda3/envs/ns0317/lib/python3.8/site-packages/sklearn/utils/class_weight.py", line 68, in compute_class_weight
    raise ValueError("Class label {} not present.".format(c))
ValueError: Class label 1 not present.
>>>
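The behaviour in the session above can be summarised with a small sketch: a class seen in fit but missing from the dict keeps a default weight of 1, while a key of the dict that never appears in the labels raises an error. This is only an illustration in plain Python; the function name resolve_class_weight is hypothetical, and it is neither the scikit-learn code nor the code from this PR.

```python
def resolve_class_weight(class_weight, y):
    """Hypothetical sketch of the dict-based class_weight check.

    Every class found in y starts with weight 1.0; each key of
    class_weight must be one of those classes, otherwise an error
    is raised (mirroring the message seen in the traceback above).
    """
    classes = sorted(set(y))
    weights = {c: 1.0 for c in classes}
    for label, weight in class_weight.items():
        if label not in weights:
            raise ValueError("Class label {} not present.".format(label))
        weights[label] = weight
    return weights


user_weights = {0: 0.2, 1: 0.4}

# All dict keys appear in the labels: accepted.
print(resolve_class_weight(user_weights, [0, 0, 0, 1]))

# Class 2 is not in the dict: accepted, it keeps the default weight.
print(resolve_class_weight(user_weights, [0, 2, 0, 1]))

# Class 1 is in the dict but absent from the labels: rejected.
try:
    resolve_class_weight(user_weights, [0, 2, 0, 2])
except ValueError as err:
    print(err)
```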
lgtm
@viclafargue to merge, could you check that we're pinning raft to the latest possible commit, and add an xfail to the non-monotonic sil score test if it hasn't been done already in branch-0.19?
Codecov Report
@@               Coverage Diff               @@
##           branch-0.19    #3572      +/-   ##
===============================================
+ Coverage        80.87%   81.00%   +0.13%
===============================================
  Files              228      228
  Lines            17630    17811     +181
===============================================
+ Hits             14258    14428     +170
- Misses            3372     3383      +11
@gpucibot merge
Closes #3559
This PR adds the sample_weight and class_weight parameters to the LogisticRegression estimator.
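As a rough illustration of how the two parameters can interact, here is a sketch in plain Python. It assumes the convention scikit-learn uses, where each sample's weight is multiplied by the weight of its class, and it assumes the loss is normalised by the sum of the weights; both are assumptions for illustration, and the helper names are hypothetical rather than taken from this PR.

```python
import math


def effective_weights(y, sample_weight=None, class_weight=None):
    """Hypothetical helper: combine per-sample and per-class weights.

    Assumes the effective weight of sample i is
    sample_weight[i] * class_weight[y[i]], with missing entries set to 1.
    """
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    if class_weight is None:
        class_weight = {}
    return [w * class_weight.get(label, 1.0)
            for w, label in zip(sample_weight, y)]


def weighted_log_loss(y, proba, weights):
    """Binary log loss where each sample contributes proportionally to
    its weight (normalisation by the weight sum is an assumption)."""
    total = sum(weights)
    loss = 0.0
    for yi, pi, wi in zip(y, proba, weights):
        loss -= wi * (yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
    return loss / total


y = [0, 1, 1]
weights = effective_weights(y, sample_weight=[1.0, 2.0, 1.0],
                            class_weight={1: 0.5})
print(weights)
print(weighted_log_loss(y, [0.2, 0.7, 0.9], weights))
```

Setting both arguments to None gives every sample a weight of 1, so the loss reduces to the ordinary mean log loss.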